Search CORE

59 research outputs found

Efficient Algorithms for Large Prime Characteristic Fields and Their Application to Bilinear Pairings

Author: Patrick Longa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 06/10/2022
Field of study

We propose a novel approach that generalizes interleaved modular multiplication algorithms to the computation of sums of products over large prime fields. This operation has widespread use and is at the core of many cryptographic applications. The method reformulates the widely used lazy reduction technique, crucially avoiding the need for storage and computation of double-precision operations. Moreover, it can be easily adapted to the different methods that exist to compute modular multiplication, producing algorithms that are significantly more efficient and memory-friendly. We showcase the performance of the proposed approach in the computation of multiplication over an extension field

\mathbb{F}_{p^k}

, and demonstrate its impact with a record-breaking implementation for bilinear pairings: a full optimal ate pairing over the popular BLS12-381 curve is computed in under half a millisecond on a 3.2GHz Intel Coffee Lake processor, which is about

1.40\times

faster than the state-of-the-art

Cryptology ePrint Archive

FourQNEON: Faster Elliptic Curve Scalar Multiplications on ARM Processors

Author: Patrick Longa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 15/07/2016
Field of study

We present a high-speed, high-security implementation of the recently proposed elliptic curve FourQ (ASIACRYPT 2015) for 32-bit ARM processors with NEON support. Exploiting the versatile and compact arithmetic of this curve, we design a vectorized implementation that achieves high-performance across a large variety of ARM platforms. Our software is fully protected against timing and cache attacks, and showcases the impressive speed of FourQ when compared with other curve-based alternatives. For example, one single variable-base scalar multiplication is computed in about 235,000 Cortex-A8 cycles or 132,000 Cortex-A15 cycles which, compared to the results of the fastest genus 2 Kummer and Curve25519 implementations on the same platforms, offer speedups between 1.3x-1.7x and between 2.1x-2.4x, respectively. In comparison with the NIST standard curve K-283, we achieve speedups above 4x and 5.5x

Cryptology ePrint Archive

A Note on Post-Quantum Authenticated Key Exchange from Supersingular Isogenies

Author: Patrick Longa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 20/03/2018
Field of study

In this work, we study several post-quantum authenticated key exchange protocols in the setting of supersingular isogenies. Leveraging the design of the well-studied schemes by Krawczyk (2003), Boyd et al. (2008), Fujioka et al. (2013), Krawczyk and Wee (2015), and others, we show how to use the Supersingular Isogeny Diffie-Hellman (SIDH) and Supersingular Isogeny Key Encapsulation (SIKE) protocols as basic building blocks to construct efficient and flexible authenticated key exchange schemes featuring different functionalities and levels of security. This note is also intended to be a ``gentle\u27\u27 introduction to supersingular isogeny based cryptography, and its most relevant constructions, for protocol designers and cryptographers

Cryptology ePrint Archive

High-Speed Elliptic Curve and Pairing-Based Cryptography

Author: Longa Patrick
Publication venue: 'University of Waterloo'
Publication date: 05/04/2011
Field of study

Elliptic Curve Cryptography (ECC), independently proposed by Miller [Mil86] and Koblitz [Kob87] in mid 80’s, is finding momentum to consolidate its status as the public-key system of choice in a wide range of applications and to further expand this position to settings traditionally occupied by RSA and DL-based systems. The non-existence of known subexponential attacks on this cryptosystem directly translates to shorter keylengths for a given security level and, consequently, has led to implementations with better bandwidth usage, reduced power and memory requirements, and higher speeds. Moreover, the dramatic entry of pairing-based cryptosystems defined on elliptic curves at the beginning of the new millennium has opened the possibility of a plethora of innovative applications, solving in some cases longstanding problems in cryptography. Nevertheless, public-key cryptography (PKC) is still relatively expensive in comparison with its symmetric-key counterpart and it remains an open challenge to reduce further the computing cost of the most time-consuming PKC primitives to guarantee their adoption for secure communication in commercial and Internet-based applications. The latter is especially true for pairing computations. Thus, it is of paramount importance to research methods which permit the efficient realization of Elliptic Curve and Pairing-based Cryptography on the several new platforms and applications. This thesis deals with efficient methods and explicit formulas for computing elliptic curve scalar multiplication and pairings over fields of large prime characteristic with the objective of enabling the realization of software implementations at very high speeds. To achieve this main goal in the case of elliptic curves, we accomplish the following tasks: identify the elliptic curve settings with the fastest arithmetic; accelerate the precomputation stage in the scalar multiplication; study number representations and scalar multiplication algorithms for speeding up the evaluation stage; identify most efficient field arithmetic algorithms and optimize them; analyze the architecture of the targeted platforms for maximizing the performance of ECC operations; identify most efficient coordinate systems and optimize explicit formulas; and realize implementations on x86-64 processors with an optimal algorithmic selection among all studied cases. In the case of pairings, the following tasks are accomplished: accelerate tower and curve arithmetic; identify most efficient tower and field arithmetic algorithms and optimize them; identify the curve setting with the fastest arithmetic and optimize it; identify state-of-the-art techniques for the Miller loop and final exponentiation; and realize an implementation on x86-64 processors with optimal algorithmic selection. The most outstanding contributions that have been achieved with the methodologies above in this thesis can be summarized as follows: • Two novel precomputation schemes are introduced and shown to achieve the lowest costs in the literature for different curve forms and scalar multiplication primitives. The detailed cost formulas of the schemes are derived for most relevant scenarios. • A new methodology based on the operation cost per bit to devise highly optimized and compact multibase algorithms is proposed. Derived multibase chains using bases {2,3} and {2,3,5} are shown to achieve the lowest theoretical costs for scalar multiplication on certain curve forms and for scenarios with and without precomputations. In addition, the zero and nonzero density formulas of the original (width-w) multibase NAF method are derived by using Markov chains. The application of “fractional” windows to the multibase method is described together with the derivation of the corresponding density formulas. • Incomplete reduction and branchless arithmetic techniques are optimally combined for devising high-performance field arithmetic. Efficient algorithms for “small” modular operations using suitably chosen pseudo-Mersenne primes are carefully analyzed and optimized for incomplete reduction. • Data dependencies between contiguous field operations are discovered to be a source of performance degradation on x86-64 processors. Three techniques for reducing the number of potential pipeline stalls due to these dependencies are proposed: field arithmetic scheduling, merging of point operations and merging of field operations. • Explicit formulas for two relevant cases, namely Weierstrass and Twisted Edwards curves over and , are carefully optimized employing incomplete reduction, minimal number of operations and reduced number of data dependencies between contiguous field operations. • Best algorithms for the field, point and scalar arithmetic, studied or proposed in this thesis, are brought together to realize four high-speed implementations on x86-64 processors at the 128-bit security level. Presented results set new speed records for elliptic curve scalar multiplication and introduce up to 34% of cost reduction in comparison with the best previous results in the literature. • A generalized lazy reduction technique that enables the elimination of up to 32% of modular reductions in the pairing computation is proposed. Further, a methodology that keeps intermediate results under Montgomery reduction boundaries maximizing operations without carry checks is introduced. Optimized formulas for the popular tower are explicitly stated and a detailed operation count that permits to determine the theoretical cost improvement attainable with the proposed method is carried out for the case of an optimal ate pairing on a Barreto-Naehrig (BN) curve at the 128-bit security level. • Best algorithms for the different stages of the pairing computation, including the proposed techniques and optimizations, are brought together to realize a high-speed implementation at the 128-bit security level. Presented results on x86-64 processors set new speed records for pairings, introducing up to 34% of cost reduction in comparison with the best published result. From a general viewpoint, the proposed methods and optimized formulas have a practical impact in the performance of cryptographic protocols based on elliptic curves and pairings in a wide range of applications. In particular, the introduced implementations represent a direct and significant improvement that may be exploited in performance-dominated applications such as high-demand Web servers in which millions of secure transactions need to be generated

CiteSeerX

University of Waterloo's Institutional Repository

Fast Multibase Methods and Other Several Optimizations for Elliptic Curve Scalar Multiplication

Author: Catherine Gebotys
Patrick Longa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 20/04/2009
Field of study

Recently, the new Multibase Non-Adjacent Form (mbNAF) method was introduced and shown to speed up the execution of the scalar multiplication with an efficient use of multiple bases to represent the scalar. In this work, we first optimize the previous method using fractional windows, and then introduce further improvements to achieve additional cost reductions. Moreover, we present new improvements in the point operation formulae. Specifically, we reduce further the cost of composite operations such as quintupling and septupling of a point, which are relevant for the speed up of multibase methods in general. Remarkably, our tests show that, in the case of standard elliptic curves, the refined mbNAF method can be as efficient as Window-w NAF using an optimal fractional window size. Thus, this is the first published method that does not require precomputations to achieve comparable efficiency to the standard window-based NAF method using precomputations. On other highly efficient curves as Jacobi quartics and Edwards curves, our tests show that the refined mbNAF currently attains the highest performance for both scenarios using precomputations and those without precomputations

Cryptology ePrint Archive

Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography

Author: Michael Naehrig
Patrick Longa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 10/11/2016
Field of study

The Number Theoretic Transform (NTT) provides efficient algorithms for cyclic and nega-cyclic convolutions, which have many applications in computer arithmetic, e.g., for multiplying large integers and large degree polynomials. It is commonly used in cryptographic schemes that are based on the hardness of the Ring Learning With Errors (R-LWE) problem to efficiently implement modular polynomial multiplication. We present a new modular reduction technique that is tailored for the special moduli required by the NTT. Based on this reduction, we speed up the NTT and propose faster, multi-purpose algorithms. We present two implementations of these algorithms: a portable C implementation and a high-speed implementation using assembly with AVX2 instructions. To demonstrate the improved efficiency in an application example, we benchmarked the algorithms in the context of the R-LWE key exchange protocol that has recently been proposed by Alkim, Ducas, Pöppelmann and Schwabe. In this case, our C and assembly implementations compute the full key exchange 1.49 and 1.13 times faster, respectively. These results are achieved with full protection against timing attacks

Cryptology ePrint Archive

Four-Dimensional Gallant-Lambert-Vanstone Scalar Multiplication

Author: Francesco Sica
Patrick Longa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/09/2012
Field of study

The GLV method of Gallant, Lambert and Vanstone~(CRYPTO 2001) computes any multiple

kP

of a point

P

of prime order

n

lying on an elliptic curve with a low-degree endomorphism

\Phi

(called GLV curve) over

\mathbb{F}_p

kP = k_1P + k_2\Phi(P)

, with

\max\{|k_1|,|k_2|\}\leq C_1\sqrt n

for some explicit constant

C_1>0

. Recently, Galbraith, Lin and Scott (EUROCRYPT 2009) extended this method to all curves over

\mathbb{F}_{p^2}

which are twists of curves defined over

\mathbb{F}_p

. We show in this work how to merge the two approaches in order to get, for twists of any GLV curve over

\mathbb{F}_{p^2}

, a four-dimensional decomposition together with fast endomorphisms

\Phi, \Psi

over

\mathbb{F}_{p^2}

acting on the group generated by a point

P

of prime order

n

, resulting in a proven decomposition for any scalar

k\in[1,n]

given by

kP=k_1P+ k_2\Phi(P)+ k_3\Psi(P) + k_4\Psi\Phi(P)

, with

\max_i (|k_i|)0

. Remarkably, taking the best

C_1, C_2

, we obtain

C_2/C_1<412

, independently of the curve, ensuring in theory an almost constant relative speedup. In practice, our experiments reveal that the use of the merged GLV-GLS approach supports a scalar multiplication that runs up to 50\% faster than the original GLV method. We then improve this performance even further by exploiting the Twisted Edwards model and show that curves originally slower may become extremely efficient on this model. In addition, we analyze the performance of the method on a multicore setting and describe how to efficiently protect GLV-based scalar multiplication against several side-channel attacks. Our implementations improve the state-of-the-art performance of point multiplication for a variety of scenarios including side-channel protected and unprotected cases with sequential and multicore execution

Cryptology ePrint Archive

FourQ: four-dimensional decompositions on a Q-curve over the Mersenne prime

Author: Craig Costello
Patrick Longa
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/08/2016
Field of study

We introduce FourQ, a high-security, high-performance elliptic curve that targets the 128-bit security level. At the highest arithmetic level, cryptographic scalar multiplications on FourQ can use a four-dimensional Gallant-Lambert-Vanstone decomposition to minimize the total number of elliptic curve group operations. At the group arithmetic level, FourQ admits the use of extended twisted Edwards coordinates and can therefore exploit the fastest known elliptic curve addition formulas over large prime characteristic fields. Finally, at the finite field level, arithmetic is performed modulo the extremely fast Mersenne prime

p=2^{127}-1

. We show that this powerful combination facilitates scalar multiplications that are significantly faster than all prior works. On Intel\u27s Broadwell, Haswell, Ivy Bridge and Sandy Bridge architectures, our software computes a variable-base scalar multiplication in 50,000, 56,000, 69,000 cycles and 72,000 cycles, respectively; and, on the same platforms, our software computes a Diffie-Hellman shared secret in 80,000, 88,000, 104,000 cycles and 112,000 cycles, respectively. These results show that, in practice, FourQ is around four to five times faster than the original NIST P-256 curve and between two and three times faster than curves that are currently under consideration as NIST alternatives, such as Curve25519

CiteSeerX

Cryptology ePrint Archive

The Cost to Break SIKE: A Comparative Hardware-Based Analysis with AES and SHA-3

Author: Jakub Szefer
Patrick Longa
Wen Wang
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 29/10/2021
Field of study

This work presents a detailed study of the classical security of the post-quantum supersingular isogeny key encapsulation (SIKE) protocol using a realistic budget-based cost model that considers the actual computing and memory costs that are needed for cryptanalysis. In this effort, we design especially-tailored hardware accelerators for the time-critical multiplication and isogeny computations that we use to model an ASIC-powered instance of the van Oorschot-Wiener (vOW) parallel collision search algorithm. We then extend the analysis to AES and SHA-3 in the context of the NIST post-quantum cryptography standardization process to carry out a parameter analysis based on our cost model. This analysis, together with the state-of-the-art quantum security analysis of SIKE, indicates that the current SIKE parameters offer higher practical security than currently believed, closing an open issue on the suitability of the parameters to match NIST\u27s security levels. In addition, we explore the possibility of using significantly smaller primes to enable more efficient and compact implementations with reduced bandwidth. Our improved cost model and analysis can be applied to other cryptographic settings and primitives, and can have implications for other post-quantum candidates in the NIST process

Cryptology ePrint Archive

Implementing 4-Dimensional GLV Method on GLS Elliptic Curves with j-Invariant 0

Author: Maozhi Xu
Patrick Longa
Zhi Hu
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 17/10/2011
Field of study

The Gallant-Lambert-Vanstone (GLV) method is a very efficient technique for accelerating point multiplication on elliptic curves with efficiently computable endomorphisms. Galbraith, Lin and Scott (J. Cryptol. 24(3), 446-469 (2011)) showed that point multiplication exploiting the 2-dimensional GLV method on a large class of curves over GF(p^2) was faster than the standard method on general elliptic curves over GF(p), and left as an open problem to study the case of 4-dimensional GLV on special curves (e.g., j(E) = 0) over GF(p^2). We study the above problem in this paper. We show how to get the 4-dimensional GLV decomposition with proper decomposed coefficients, and thus reduce the number of doublings for point multiplication on these curves to only a quarter. The resulting implementation shows that the 4-dimensional GLV method on a GLS curve runs in about 0.78 the time of the 2-dimensional GLV method on the same curve and in between 0.78-0.87 the time of the 2-dimensional GLV method using the standard method over GF(p). In particular, our implementation reduces by up to 27% the time of the previously fastest implementation of point multiplication on x86-64 processors due to Longa and Gebotys (CHES2010)

Cryptology ePrint Archive